Analysis of spontaneous Japanese in a multi-language telephone-speech corpus

Authors

  • Takayuki Arai
  • Natasha Warner
  • Steven Greenberg
Abstract

Takayuki Arai, Natasha Warner and Steven Greenberg

  • Department of Electrical and Electronics Engineering, Sophia University, 7–1 Kioi-cho, Chiyoda-ku, Tokyo, 102–8554 Japan
  • Department of Linguistics, University of Arizona, PO Box 210028, Tucson, AZ 85721–0028, USA
  • Silicon Speech, 46 Oxford Drive, Santa Venetia, CA 94903, USA; Centre for Applied Hearing Research, Technical University of Denmark, Kgs. Lyngby, DK-2800, Denmark

Similar resources

The OGI multi-language telephone speech corpus

The OGI Multi-language Telephone Speech Corpus is designed to support research on automatic language identification and multi-language speech recognition. The corpus consists of up to nine separate responses from each caller, ranging from single words to short topic-specific descriptions to 60 seconds of unconstrained spontaneous speech. The utterances were spoken over commercial telephone lines ...

Selection of Multi-Word Expressions from Web N-gram Corpus for Speech Recognition

This paper proposes a method for constructing a statistical language model with multi-word expressions (MWEs) selected from the Google Japanese Web N-gram. MWEs are concatenated words that consist of idiomatic expressions or long morpheme sequences used frequently. In this paper a method for selecting the effective MWEs that improve the language model based on co-occurrence probabilities of ...

Spontaneous Speech Corpus of Japanese

Design issues of a spontaneous speech corpus are described. The corpus under compilation will contain 800-1000 hours of spontaneously uttered Common Japanese speech and morphologically annotated transcriptions. Segmental and intonation labeling will also be provided for a subset of the corpus. The primary application domain of the corpus is speech recognition of spontaneous speech, but we plan ...

Analysis of Language Variation Using a Large-Scale Corpus of Spontaneous Speech

A large-scale corpus of spontaneous speech can be a powerful tool for the study of language variation. Moreover, given that the corpus is publicly available, corpus-based analysis could open up the possibility of follow-up analysis in this area of linguistic study. Generally speaking, follow-up study is highly desirable in the sciences, but so far it has been virtually impossible in the area of socio-...

Automatic Estimation of Speaking Rate in Multilingual Spontaneous Speech

This paper develops an automatic method for estimating speaking rate. It is based on an unsupervised vowel detection algorithm and can therefore be applied to any language at no additional cost. Validation is carried out on a spontaneous speech subset of the OGI Multilingual Telephone Speech Corpus. The correlation coefficient between the estimated and real speaking rates (evaluated in terms of vowel-per-second rates...

Publication date: 2006